Build citation graph based on shared posts, mentions, and URLs referencing other channels
Identify next batch of channels
Retrieve all posts from these channels
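The crawl steps above can be sketched in Python as follows. This is a minimal illustration, not tg.observer's actual code: the post fields (`channel`, `forwarded_from`, `mentions`, `urls`), the `t.me` link pattern, and the function names are all assumptions. The batch is ranked by how often a channel is referenced, which is only a placeholder for the real prioritisation.

```python
import re
from collections import defaultdict

# Assumed shape of a public channel link, e.g. https://t.me/some_channel/123.
# Special paths such as invite links are not filtered out in this sketch.
TME_LINK = re.compile(r"(?:https?://)?t\.me/([A-Za-z][A-Za-z0-9_]{3,})")


def referenced_channels(post):
    """Return the channels a post points to (hypothetical post schema)."""
    refs = set()
    if post.get("forwarded_from"):          # shared (forwarded) post
        refs.add(post["forwarded_from"])
    refs.update(post.get("mentions", []))   # @mentions of other channels
    for url in post.get("urls", []):        # URLs referencing other channels
        match = TME_LINK.search(url)
        if match:
            refs.add(match.group(1))
    refs.discard(post["channel"])           # ignore self-references
    return refs


def build_citation_graph(posts):
    """Directed, weighted graph: graph[source][target] = number of citing posts."""
    graph = defaultdict(lambda: defaultdict(int))
    for post in posts:
        for target in referenced_channels(post):
            graph[post["channel"]][target] += 1
    return graph


def next_batch(graph, crawled, batch_size):
    """Pick uncrawled channels, most-referenced first (placeholder ranking)."""
    counts = defaultdict(int)
    for targets in graph.values():
        for target, weight in targets.items():
            if target not in crawled:
                counts[target] += weight
    ranked = sorted(counts, key=lambda c: (-counts[c], c))
    return ranked[:batch_size]
```

Retrieving all posts for the returned batch and feeding them back into `build_citation_graph` closes the loop.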
Sampling techniques
Sampling from hidden network, constrained by rate limits -> How to prioritise?
Typically breadth-first search or exponential discriminative snowball sampling -> Problem: bias towards high-degree nodes; smaller communities are not represented
tg.observer crawls based on Mutual Friend Crawling (Blenn et al. 2017) within all communities and languages:
\[
S_R = \frac{\text{degree of node } f \text{ within community}}{\text{degree of node } f \text{ in entire graph}}
\]
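A small sketch of how S_R could be computed and used to choose the next node. It rests on several assumptions: the graph is an undirected adjacency map, the "community" is the set of nodes crawled so far, the total degree of a candidate is already known, and the candidate with the highest S_R is crawled first. The function names are hypothetical.

```python
def community_score(node, community, neighbours):
    """S_R: share of a node's links that lead into the current community."""
    degree = len(neighbours[node])
    if degree == 0:
        return 0.0
    return len(neighbours[node] & community) / degree


def pick_next(community, neighbours):
    """Return the frontier node with the highest S_R, or None if none is left."""
    frontier = set().union(*(neighbours[n] for n in community)) - community
    if not frontier:
        return None
    # Sorting first makes ties resolve alphabetically and deterministically.
    return max(sorted(frontier),
               key=lambda f: community_score(f, community, neighbours))


# Toy graph: "hub" has many links, but most of them leave the community.
neighbours = {
    "a": {"b", "c", "hub"},
    "b": {"a", "c"},
    "c": {"a", "b", "hub"},
    "hub": {"a", "c", "x", "y", "z"},
    "x": {"hub"},
    "y": {"hub"},
    "z": {"hub"},
}
print(pick_next({"a", "b"}, neighbours))
```

In the toy graph, `c` scores 2/3 while the high-degree `hub` scores only 1/5, so the crawl stays inside the small community before following the hub.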